Crime Analysis

Contents

1. Reporting of the Data Sets

2. Analysis of the Data Sets

3. Visualisation of the Data Sets

4. Advanced Analysis/Visualisation

1. Reporting of the Data Sets

There were 18 raw data sets to import and append into a single raw data file. The code below was therefore adapted from https://stackoverflow.com/questions/51321021/how-to-read-multiple-csv-files-in-a-directory-through-python-csv-function and re-used.
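A minimal sketch of this import step, assuming the raw files are CSVs in a single folder. The helper name `load_csv_folder`, the file names and the toy rows are illustrative assumptions, not the code or data used in the report.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Read every CSV file in `folder` and append them into one DataFrame."""
    paths = sorted(Path(folder).glob("*.csv"))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {folder}")
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Toy demonstration with two tiny made-up files (hypothetical names and rows).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "2019-06-a.csv").write_text(
        "Month,Crime type\n2019-06,Burglary\n"
    )
    Path(tmp, "2019-07-a.csv").write_text(
        "Month,Crime type\n2019-07,Robbery\n2019-07,Burglary\n"
    )
    combined = load_csv_folder(tmp)

print(combined)
```

With the real data, the same helper would be pointed at the folder holding the 18 files.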

The fields 'Crime ID' and 'Context' are not needed for this analysis, so they were dropped using the code below.
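Dropping the two fields could look like the sketch below. Only 'Crime ID' and 'Context' are named in the text; the other column names and all values are assumptions made for illustration.

```python
import pandas as pd

# Toy frame standing in for the combined raw data (made-up rows).
raw = pd.DataFrame(
    {
        "Crime ID": ["a1", None],
        "Month": ["2019-06", "2019-06"],
        "Crime type": ["Burglary", "Robbery"],
        "Context": [None, None],
    }
)

# drop() returns a new frame, so `raw` itself is left unchanged.
crimes = raw.drop(columns=["Crime ID", "Context"])
print(crimes.columns.tolist())
```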

The data frames had to be re-arranged to carry out the various computations and visualisations, which required grouping by certain variables and aggregating in a single step. The following links were referred to when writing the code for these tasks: https://www.statology.org/pandas-groupby-aggregate-multiple-columns/ and https://jamesrledoux.com/code/group-by-aggregate-pandas
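One way to group and aggregate together is pandas named aggregation, sketched below. The column names 'Falls within', 'Month' and 'Crime type' and the output names `total_crimes` and `crime_types` are assumptions, and the rows are made up.

```python
import pandas as pd

# Toy data: one row per recorded crime (made-up rows).
crimes = pd.DataFrame(
    {
        "Falls within": ["Metropolitan Police Service"] * 3
        + ["South Wales Police"] * 2,
        "Month": ["2019-06", "2019-06", "2019-07", "2019-06", "2019-06"],
        "Crime type": ["Burglary", "Robbery", "Burglary", "Burglary", "Burglary"],
    }
)

# Group by police area and month, then compute two aggregates at once:
# the number of non-null crime records and the number of distinct crime types.
monthly_totals = crimes.groupby(["Falls within", "Month"], as_index=False).agg(
    total_crimes=("Crime type", "count"),
    crime_types=("Crime type", "nunique"),
)
print(monthly_totals)
```

Passing `as_index=False` keeps the grouping variables as ordinary columns, which is convenient when the result is handed straight to a plotting function.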

The re-arranged data frame was used to draw a line chart to visualise the trends in the crime totals over the periods for the two police areas. The weblink https://seaborn.pydata.org/tutorial/aesthetics.html was referred to when compiling the code.

Grouped bar charts were drawn below for each of the police areas by year grouped by crime type. Plotly.express was used as it produced better formatted charts than those drawn using matplotlib. Code reference: https://towardsdatascience.com/how-to-create-a-grouped-bar-chart-with-plotly-express-in-python-e2b64ed4abd7

2. Analysis of the Data Sets

Further to the previous analysis carried out, this section includes a comparison of the crime counts of the two police areas normalised for population. Here, a hypothesis is also tested.

Next, population estimates for 2019 and 2020, obtained from the Office for National Statistics website by filtering the geographies, were imported.

The populations of inner London excluding the City of London and of outer London were taken for the Metropolitan Police Service area. The populations of Swansea, Neath Port Talbot, Rhondda Cynon Taff, Bridgend, Vale of Glamorgan, Cardiff, Newport, Monmouthshire, Torfaen, Blaenau Gwent, Caerphilly and Merthyr Tydfil were taken for South Wales. The 2021 population was assumed to be the same as in 2020 since the 2021 figures were unavailable.
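The normalisation can be sketched as a merge followed by a rate calculation. The area labels, the column names, the "per 1,000 residents" scale and every number below are assumptions chosen for illustration; they are not ONS figures or results from the report.

```python
import pandas as pd

# Toy yearly crime totals for two placeholder areas (made-up counts).
crime_totals = pd.DataFrame(
    {
        "Area": ["A", "A", "A", "B", "B", "B"],
        "Year": [2019, 2020, 2021] * 2,
        "total_crimes": [1000, 900, 950, 200, 180, 190],
    }
)

# Toy population estimates, available for 2019 and 2020 only (made-up values).
population = pd.DataFrame(
    {
        "Area": ["A", "A", "B", "B"],
        "Year": [2019, 2020, 2019, 2020],
        "population": [100_000, 100_000, 50_000, 40_000],
    }
)

# Carry the 2020 estimate forward as the assumed 2021 population.
pop_2021 = population[population["Year"] == 2020].assign(Year=2021)
population = pd.concat([population, pop_2021], ignore_index=True)

# validate= raises an error if an (Area, Year) pair appears twice in `population`.
rates = crime_totals.merge(
    population, on=["Area", "Year"], how="left", validate="many_to_one"
)
rates["crimes_per_1000"] = rates["total_crimes"] / rates["population"] * 1000
print(rates)
```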

Hypothesis Testing

The total crimes line chart by year for South Wales drawn earlier showed that the summer months of 2019, 2020 and 2021 had similar crime counts. The hypothesis below was tested to check whether the median monthly crime counts during the summer months are equal in the South Wales Police area.

H0: The median monthly number of crimes in the South Wales Police area is equal across all 3 summers.
H1: The median monthly number of crimes in the South Wales Police area differs in at least one of the 3 summers.

To test this hypothesis, the non-parametric Kruskal-Wallis test was used.

Code from https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/ was adapted and re-used to conduct the Kruskal-Wallis test.
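A minimal sketch of the test using `scipy.stats.kruskal`. The monthly counts, the three-months-per-summer layout and the 0.05 significance level are made-up assumptions, so the printed decision says nothing about the report's actual result.

```python
from scipy.stats import kruskal

# Toy monthly crime counts for each summer (made-up values).
summer_2019 = [20100, 20500, 19800]
summer_2020 = [19900, 20700, 20300]
summer_2021 = [20400, 20000, 20600]

# Each argument is one independent sample (one summer).
stat, p_value = kruskal(summer_2019, summer_2020, summer_2021)
print(f"H statistic = {stat:.3f}, p-value = {p_value:.3f}")

alpha = 0.05  # assumed significance level
if p_value > alpha:
    print("Fail to reject H0: no evidence that the summer medians differ.")
else:
    print("Reject H0: at least one summer differs.")
```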

3. Visualisation of the Data Sets

Although this section is dedicated to visualisations, many visualisations were also produced while reporting and analysing the data sets. In this section, pie charts are the main focus.

As the previous illustrations show, of all the periods across both police areas, the Metropolitan Police Service area recorded the highest number of crimes during the summer of 2020. Below is a pie chart showing each crime type as a percentage of the total number of crimes in the area during the summer of 2020. The web link https://waynestalk.com/en/python-pie-donut-sunburst-charts-en/ was referred to when compiling the code below.

The slices of this pie chart for 2020 can be compared with the slices of the pie charts below for 2019 and 2021. The subplot of pie charts below was adapted from code at https://plotly.com/python/pie-charts/

4. Advanced Analysis/Visualisation